A project that I am working on will require the storage and retrieval of several million images. The current idea for managing these images is to create a directory and dump 3,000 images into it, then create a database record that holds the name of the image and the path to that image. Repeat when new images are added. Each million images would require 334 directories. Retrieval needs to be fast and highly scalable. Is there a better way?

posted by Mr_Zero to Computers & Internet (14 answers total)

The database solution is BLOB columns, which are usually implemented much like what you describe, but within the DB. There is a main table that just holds an ID referencing the BLOB, and a separate helper table that holds all the binary data. The main table stays fast because it is small.

posted by smackfu at 10:58 AM on March 30, 2007

The directory-based solution sounds workable. The DB part might not even be needed if there is enough metadata in the file name. I've handled similar setups with a similar number of files before. If there is enough information, a series of hashed directories works well, but that depends on how "hashable" the filenames are. Keeping the maximum number of items (be it subdirectories or files) in each directory fairly small (say, 500 or so) will help speed on many filesystems. But it sounds like that was more or less already in the plan.

How big are the images you are talking about? If you are talking about huge images and ludicrous speed requirements, a specialized filesystem might be needed. Some of the ideas in, say, MogileFS might be useful, depending on your particular needs.

posted by alikins at 11:16 AM on March 30, 2007

I'd say for most scenarios this is a reasonable, fast way to go. Store some metadata in the DB and let the filesystem handle the rest. You will probably want a small hierarchy of directories instead of one level. Put your DB on a RAID 5 and your images on a RAID 0+1.
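The two directory layouts discussed above can be sketched in a few lines. This is a minimal Python illustration under stated assumptions, not anyone's actual code: `sequential_dir` follows the question's "fill one directory with 3,000 images, then start the next" scheme, while `hashed_path` is a hypothetical two-level hashed layout of the kind alikins and ldenneau suggest. It assumes filenames are unique and uses the first four hex digits of an MD5 digest of the name to pick the two directory levels; the root name `images` and the six-digit directory names are also assumptions.

```python
import hashlib
import os

FILES_PER_DIR = 3000  # bucket size taken from the question


def sequential_dir(image_id: int) -> str:
    """Question's scheme: images 0-2999 go in 000000, 3000-5999 in 000001, etc."""
    return f"{image_id // FILES_PER_DIR:06d}"


def hashed_path(filename: str, root: str = "images") -> str:
    """Hypothetical two-level hashed layout (assumes unique filenames).

    The first two hex digits of the MD5 digest pick the top-level directory
    and the next two pick the subdirectory: 256 * 256 = 65,536 leaf dirs.
    The path is computed from the name alone, so no DB lookup is needed.
    """
    digest = hashlib.md5(filename.encode("utf-8")).hexdigest()
    return os.path.join(root, digest[:2], digest[2:4], filename)


if __name__ == "__main__":
    dirs = {sequential_dir(i) for i in range(1_000_000)}
    print(len(dirs))  # 334 directories for one million images
    print(hashed_path("cat.jpg"))  # e.g. images/xx/yy/cat.jpg
```

With 65,536 leaf directories, one million images average about 15 files per directory and 100 million average about 1,500, so a third hash level would be one way to stay near the ~500-per-directory figure mentioned above at the larger scale.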
But as usual, the best way depends on the particulars. How many images do you need to scale to? 10M? 100M? 1B? What is the size distribution of the images: are they all thumbnail-sized (tens of KB), or are they all "large" (1 MB or so)? What will your access patterns be: completely random, or will you usually access them in some order? How often, if ever, will you modify the collection or add to it? Is it always growing? How important are fault tolerance and maximum uptime?

posted by ldenneau at 11:18 AM on March 30, 2007

Don't forget that a huge advantage of the BLOBs-in-the-database approach is that backing up and restoring your solution involves just the database. You don't have to keep database and filesystem backups in sync. Also, if you ever want to scale your solution to multiple web servers or multiple database servers, you may find that the built-in clustering support in your database product has a lot fewer corner cases than your home-grown filesystem storage plus the real RDBMS.

posted by mmascolino at 11:22 AM on March 30, 2007

And no, storing lots of binary data in a single database table doesn't necessarily mean it's going to be slow. Lookup will be indexed and done via a simple ID field, so the database doesn't care what's in the BLOB column until it finds the row it wants. I know that MSSQL stores TEXT columns (really textual BLOBs) in separate disk pages from the rest of the row. That means if you don't select that column, it isn't even read as part of the row. That's useful when you want to pull up metadata about an item but not the text itself: you just make sure your SELECT statement doesn't include the text column.

posted by cschneid at 11:22 AM on March 30, 2007

Different filesystems are going to have different performance characteristics with large numbers of files. What are the performance parameters of your application?
What level of concurrency do you need for adding and viewing images? What sort of interface do you need to the data (HTTP? SMB? NFS? An arbitrary API in the language of your choice?) What sort of access control does your application require? I'm suspicious of the BLOB-in-database approach because I've been reading a lot about image storage approaches for web apps, and basically no one is going that route. But that might be the way to go if your storage needs are finite and your concurrency levels are manageably low.
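For comparison, the BLOB-in-database layout that smackfu and cschneid describe (a small metadata table plus a separate table holding only the binary data, both keyed by an integer ID) might look like the sketch below. It uses Python's built-in `sqlite3` module purely as a stand-in; MSSQL or another server RDBMS would differ in syntax and storage details, and the table and column names (`images`, `image_data`, `size_bytes`) are hypothetical.

```python
import sqlite3
from typing import Optional, Tuple


def create_schema(conn: sqlite3.Connection) -> None:
    # Small metadata table: cheap to scan and index.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS images ("
        " id INTEGER PRIMARY KEY,"
        " name TEXT NOT NULL,"
        " size_bytes INTEGER NOT NULL)"
    )
    # Separate helper table holding only the binary payload, keyed by the same ID.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS image_data ("
        " image_id INTEGER PRIMARY KEY REFERENCES images(id),"
        " data BLOB NOT NULL)"
    )


def store_image(conn: sqlite3.Connection, name: str, data: bytes) -> int:
    # "with conn" commits both inserts together, or rolls back on error,
    # so metadata and image bytes cannot drift out of sync.
    with conn:
        cur = conn.execute(
            "INSERT INTO images (name, size_bytes) VALUES (?, ?)", (name, len(data))
        )
        image_id = cur.lastrowid
        conn.execute(
            "INSERT INTO image_data (image_id, data) VALUES (?, ?)", (image_id, data)
        )
    return image_id


def get_metadata(conn: sqlite3.Connection, image_id: int) -> Optional[Tuple[str, int]]:
    # Reads only the images table; image_data is never touched.
    return conn.execute(
        "SELECT name, size_bytes FROM images WHERE id = ?", (image_id,)
    ).fetchone()


def get_image(conn: sqlite3.Connection, image_id: int) -> Optional[bytes]:
    # Indexed lookup by integer primary key.
    row = conn.execute(
        "SELECT data FROM image_data WHERE image_id = ?", (image_id,)
    ).fetchone()
    return row[0] if row else None


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_schema(conn)
    img_id = store_image(conn, "cat.jpg", b"\xff\xd8\xff\xe0")
    print(get_metadata(conn, img_id))  # ('cat.jpg', 4)
```

Because `get_metadata` reads only the `images` table, listing or searching metadata never reads the BLOB pages, which is roughly the effect cschneid describes for MSSQL TEXT columns; the cost is a second query when you do want the bytes.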